NATO Science & Technology Organization

Activity title	Using Simulation to Train AI for Automated Scene Understanding
Activity Reference	SET-343
Panel	SET
Security Classification	Other
Status	Proposed
Activity type	RTG
Start date	2024-09-01T00:00:00Z
End date	2027-09-01T00:00:00Z
Keywords	Adaptability, Artificial Intelligence, Deep Learning, Reality Gap, SET, Simulation
Background	In commercial industry, AI models have achieved notable successes in automated scene understanding, in applications ranging from intelligent surveillance systems to automated driving. These successes have not yet been carried over to military problems, because the military data needed to train such models does not exist in sufficient quantity and is typically neither available to nor of interest to industry leaders. Collecting enough representative samples of militarily relevant data is very hard, for example for other imaging modalities such as thermal infrared, or for scenarios in inaccessible terrain. For many military use cases, training datasets are therefore not available: acquiring and labelling them is often too expensive and, for very rare events or during system development, impossible because the data cannot be recorded. As emphasized by the SET-272 RTG ‘Using AI for Automated Scene Understanding’, one solution to the lack of sufficient real training data is to simulate it. In principle, simulation can generate large, representative datasets for any scenario of interest, which could lead to high-quality AI models. In many fields NATO already has sophisticated models of the application, scenarios, phenomena, apparatus and sensors; exploiting this knowledge should provide a good path to generating high-quality AI models. However, despite promising attempts to use simulation for training data, the members of ET-134 found that AI models trained on simulated data currently do not necessarily perform well on real-world data. To bridge this gap, this RTG aims to study simulation variation, image realism, and the underlying physics in order to obtain simulated data that enables strong AI model development. A further goal is to determine which AI (training) methods work well with synthetic data.
Objectives	- Research how much and what kind of variation a simulated dataset needs in order to train an AI model that performs well on real data.
- Research the trade-off between simulating datasets with a high level of physical realism and simulating datasets with more samples and increased variation.
- Investigate AI methods and training procedures that work well with simulated data.
- Determine how to effectively combine real and simulated data when training AI models.
Topics	- Requirements on how much variation a simulated dataset should contain. This includes variation in physical modelling (e.g. sensor or atmospheric modelling), image-space variation (e.g. camera-object distance, rotation of objects), and variation in scene content (whether the variations in the simulation match those expected in real military scenarios).
- The trade-off between using more expensive simulators, in terms of complexity and computational power, that generate data with a higher level of physical realism, and using a simulated dataset with less physical realism but a larger number of samples with more variation.
- Metrics or instructions for evaluating the quality of synthetic data for specific scenarios. Currently used metrics include mutual information, cross entropy, PSNR, and SSIM, but other possibilities can be explored.
- The use of real data in combination with synthetic data. This should include a recommended mixing ratio when using both and a recommended training strategy, i.e. whether to train on synthetic data first and then fine-tune on real data, vice versa, or to combine the real and synthetic data in one dataset.
- Generative AI models, including GANs and diffusion models, for simulating datasets, and how these models compare with the use of physical models.
- Novel AI models (e.g. transformers, foundation models) and training strategies (e.g. progressive unfreezing) that work well when training with simulated data.
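As an illustration of one of the image-quality metrics named in the topics above, the following is a minimal Python sketch of PSNR (peak signal-to-noise ratio), computed as 10·log10(MAX² / MSE). It assumes a real image and a simulated image of the same scene with identical dimensions, stored as nested lists of 8-bit grey values; the function names `mse` and `psnr` and the sample pixel values are hypothetical and are not taken from the activity description.

```python
import math


def mse(reference, candidate):
    """Mean squared error between two images given as nested lists of pixel rows."""
    if len(reference) != len(candidate) or any(
        len(r) != len(c) for r, c in zip(reference, candidate)
    ):
        raise ValueError("images must have the same shape")
    n_pixels = sum(len(row) for row in reference)
    if n_pixels == 0:
        raise ValueError("images must not be empty")
    squared_error = sum(
        (p - q) ** 2
        for r, c in zip(reference, candidate)
        for p, q in zip(r, c)
    )
    return squared_error / n_pixels


def psnr(reference, candidate, max_value=255.0):
    """PSNR in dB; max_value is the largest possible pixel value (assumed 8-bit)."""
    error = mse(reference, candidate)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical 3x3 grey-value patches: a real capture and a simulated rendering.
real = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
synthetic = [[54, 55, 60], [59, 75, 61], [76, 64, 59]]

print(f"PSNR: {psnr(real, synthetic):.2f} dB")
```

PSNR only compares pixel values of paired images, so it says nothing by itself about whether a model trained on the simulated data will perform well on real data; it is shown here only as one of the currently used metrics listed above.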